DIY Exercise 13-1: Create a batch job to process records
Time estimate: 2 hours
Objectives
In this exercise, you create a batch processing system. You will:
· Control batch step processing using batch filters.
· Exchange data between batch steps using variables.
· Trigger batch processing on a schedule.
· Use watermarks to avoid duplicate processing.
· Improve performance by using batch aggregators.
· Improve performance and share data with other Mule applications using VM queues.
Scenario
The Finance department needs to audit certain transactions and requires a Mule application that regularly retrieves those transactions from a database and writes them as CSV files to a server.
To meet compliance standards, each CSV file can contain no more than 50 records, and the Mule application must be deployed to a private server, where it will share a Mule domain with other financial compliance Mule applications. You do not, however, need to create a new Mule domain project yourself; another developer will be responsible for deploying your project into an existing Mule domain.
Create a project that retrieves new transactions from the database using batch
Create a new Mule application that retrieves data from the flights_transactions table in the database using the following information:
· Host: mudb.learn.mulesoft.com
· Port: 3306
· User: mule
· Password: mule
· Database: training
· Table: flights_transactions
Schedule the main flow to automatically run every 5 seconds. Retrieve new database records based on the value of the primary key field transactionID. Use an ObjectStore to save the maximum transactionID processed for any batch session.
Hint: For test development, limit the query to only retrieve 10 records at a time.
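A sketch of this polling flow in Mule 4 XML might look like the following. The configuration names, the flow name, and the maxTransactionId watermark key are illustrative assumptions, not required values:

```xml
<db:config name="Database_Config">
  <db:my-sql-connection host="mudb.learn.mulesoft.com" port="3306"
                        user="mule" password="mule" database="training"/>
</db:config>

<flow name="pollTransactionsFlow">
  <scheduler>
    <scheduling-strategy>
      <fixed-frequency frequency="5" timeUnit="SECONDS"/>
    </scheduling-strategy>
  </scheduler>
  <!-- Read the last processed transactionID; default to 0 on the first run -->
  <os:retrieve key="maxTransactionId" target="lastId">
    <os:default-value>0</os:default-value>
  </os:retrieve>
  <db:select config-ref="Database_Config">
    <db:sql>SELECT * FROM flights_transactions
            WHERE transactionID > :lastId
            ORDER BY transactionID LIMIT 10</db:sql>
    <db:input-parameters>#[{ lastId: vars.lastId }]</db:input-parameters>
  </db:select>
  <!-- Save the new watermark; keep the old one when no new rows were returned -->
  <os:store key="maxTransactionId">
    <os:value>#[max(payload map $.transactionID) default vars.lastId]</os:value>
  </os:store>
  <!-- ... the Batch Job that processes the retrieved records goes here ... -->
</flow>
```

When no objectStore reference is given, the os:retrieve and os:store operations use the Mule runtime's default object store.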
Add a flow to mock the financial compliance application logic
Add a new flow to the Mule application with a VM Listener on the VM queue named validate. Add a Transform Message component to this flow with DataWeave code that simulates the transactionID validation logic: the flow expects one record and returns true or false, where true indicates that the record needs to be audited. In this simple mock flow, return true if the transactionID is divisible by 4, and false otherwise:
%dw 2.0
output application/java
---
(payload.transactionID as Number) mod 4 == 0
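Wired into XML, the mock flow might look like the following sketch (the flow name and the VM_Config global element name are assumptions):

```xml
<vm:config name="VM_Config">
  <vm:queues>
    <vm:queue queueName="validate"/>
  </vm:queues>
</vm:config>

<flow name="mockValidationFlow">
  <vm:listener queueName="validate" config-ref="VM_Config"/>
  <ee:transform>
    <ee:message>
      <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
(payload.transactionID as Number) mod 4 == 0]]></ee:set-payload>
    </ee:message>
  </ee:transform>
</flow>
```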
Send each transaction record to a VM queue for conditional testing
Publish each transaction record to the validate VM queue and wait to consume the Boolean response. Save the result in a variable so the next batch step can filter the current record.
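Inside the first batch step, the publish-and-wait plus the target variable could be sketched as follows (the step, config, and variable names are assumptions):

```xml
<batch:step name="validateStep">
  <!-- Publish the record, block for the Boolean reply, and store it in vars.needsAudit -->
  <vm:publish-consume queueName="validate" config-ref="VM_Config" target="needsAudit"/>
</batch:step>
```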
Add batch filters to only process transactions that need auditing
Configure target variables and an accept expression in each batch step so that each record's validation result follows it through the batch steps.
Hint: For test development, you can create a flow that listens on the validate queue path and arbitrarily return true or false for each record processed.
Write out transactions as CSV files
In a second batch step, configure an accept expression so the step only processes records whose VM queue response was true. Inside this batch step, transform the database records to CSV and save the CSV file to this Mule application's file system. Use a property placeholder for the file location so Ops staff can modify it at deployment time. Add a batch aggregator so that no more than 50 records at a time are written to each CSV file.
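The second batch step might be sketched as follows. The step and variable names, the output.path property, and the file-naming scheme are assumptions:

```xml
<batch:step name="writeStep" acceptExpression="#[vars.needsAudit == true]">
  <!-- Aggregate up to 50 records per CSV file -->
  <batch:aggregator size="50">
    <ee:transform>
      <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/csv
---
payload]]></ee:set-payload>
      </ee:message>
    </ee:transform>
    <!-- ${output.path} is resolved from a properties file at deployment time -->
    <file:write path="#['${output.path}' ++ '/transactions-'
        ++ now() as String {format: 'yyyyMMddHHmmssSSS'} ++ '.csv']"/>
  </batch:aggregator>
</batch:step>
```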
Log the batch processing summary
In the On Complete phase of the Batch Job, log the batch processing summary.
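One possible On Complete phase is shown below; in this phase the payload is the batch job result object, which summarizes processed, successful, and failed record counts:

```xml
<batch:on-complete>
  <!-- The payload is a BatchJobResult; logging it prints the processing summary -->
  <logger level="INFO" message="#[payload]"/>
</batch:on-complete>
```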
Test your solution
Debug your solution. Step through several polling runs and verify that some records return true from the VM queue and are processed by the second batch step, while other records return false and skip the second batch step. Also verify that the output CSV files contain at most 50 records each.
Verify your solution
Import the /files/module13/audit-mod13-file-write-solution.jar deployable archive file (in the MUFundamentals4.x DIY Files zip that you can download from the Course Resources) and compare it with your solution.
Going further: Handle errors
Add logic to the first batch step to throw errors.
· Call a flow at the beginning of the first batch step and add event processors to this flow that would sometimes throw an error, but not for every record.
· Experiment with what happens when you handle the error in the flow versus if you don't handle the error in the flow.
· Add a third batch step with an ONLY_FAILURES accept policy to report failed messages to a dead letter VM queue for failed batch steps.
· In the Batch Job scope's general configuration, change the max failed records to 1 and observe the behavior of subsequent batch records after a record throws an error. Change this value to 2 and observe any changes in behavior.
· In the Batch Job scope's general configuration, change the scheduling strategy option to ROUND_ROBIN, observe the behavior, and compare it with the default ORDERED_SEQUENTIAL option's behavior.
· Look at the logs for the On Complete phase to see how many times the same error is reported for each record of the batch job.
· In the first batch step's referenced flow, add a Choice router and a sleep timer that sleeps for a minute. Add logic to the Choice router to only call the sleep timer if the transactionID ends with 6. Observe if later records in the batch job can skip ahead while some records are paused by the sleep timer.
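The ONLY_FAILURES step from the list above might be sketched like this (the step and queue names are assumptions):

```xml
<batch:step name="reportFailuresStep" acceptPolicy="ONLY_FAILURES">
  <!-- Forward each failed record to a dead letter queue for later inspection -->
  <vm:publish queueName="deadLetter" config-ref="VM_Config"/>
</batch:step>
```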
Note: For info about handling batch errors, see: https://blogs.mulesoft.com/dev/mule-dev/handle-errors-batch-job/
Going further: Refactor the validation logic to another Mule application in a new shared Mule domain
Create a Mule domain project named finance, then change the Mule application's Mule domain from default to finance. Move the VM connector global element to the finance Mule domain project.
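After the move, the shared VM connector in the domain project might look like the following in mule-domain-config.xml (schema locations are omitted for brevity, and the VM_Config name is an assumption):

```xml
<domain:mule-domain
    xmlns:domain="http://www.mulesoft.org/schema/mule/ee/domain"
    xmlns:vm="http://www.mulesoft.org/schema/mule/vm">
  <!-- Both Mule applications in this domain can reference VM_Config -->
  <vm:config name="VM_Config">
    <vm:queues>
      <vm:queue queueName="validate"/>
    </vm:queues>
  </vm:config>
</domain:mule-domain>
```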
Move the validation flow to a new Mule application and configure this Mule application to also use the finance Mule domain.
Deploy the Mule domain and both Mule applications to a customer-hosted Mule runtime. Verify batch jobs are still processed correctly.
Going further: Deploy both Mule applications to CloudHub
Because Mule domains and VM queues are not shared between separate CloudHub workers, configure both Mule applications to use an external online JMS server that is accessible over the public internet instead. Deploy both Mule applications to CloudHub and verify batch jobs are still processed correctly.